Identifying breast cancer using artificial intelligence technologies is valuable and has a great influence on the early detection of the disease. It can also save lives by giving patients a better chance of being treated in the earlier stages of cancer. During the last decade, deep neural networks (DNN) and machine learning (ML) systems have been widely used in almost every department of medical centers because of their accurate identification and recognition of diseases, especially when trained on many samples. In this paper, a DNN with two hidden layers is proposed that reduces the number of additions and multiplications in each neuron. The number of bits and binary points of the inputs and weights can be changed through the mask configuration of each subsystem to further reduce the hardware requirements. The DNN was designed using Xilinx system generator and implemented in VHSIC hardware description language (VHDL). The system achieves a superior accuracy rate of approximately 99.6 percent in distinguishing benign from malignant tissue. The hardware resources were also reduced by 30 percent compared with previous works, with an error rate of 7e-4, when using the Kintex-7 xc7k325t-3fbg676 board.

1. INTRODUCTION

Mammography is the traditional method for the early detection of breast cancer, which increases the patient's chance of beating the disease [1]. Computerized methods offer another route, provided that they diagnose correctly and in a timely manner. A hardware implementation of a deep neural network (DNN) must diagnose breast cancer efficiently with the minimum possible hardware on the field programmable gate array (FPGA) [2]. Xilinx provides tools for working with programmable hardware, including the Xilinx system generator (XSG), which can be used to implement the DNN [3]-[6]. XSG converts a block diagram built in Simulink into VHSIC hardware description language (VHDL) code that can be used to program an FPGA board [3], [7]-[9]. The classification performance of the DNN can be evaluated using accuracy tests. The familiar statistics are the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) counts. TP is the number of positive samples that are correctly classified as positive. FP is the number of negative samples that are incorrectly classified as positive. TN is the number of negative samples that are correctly classified as negative. Finally, FN is the number of positive samples that are incorrectly classified as negative [9], [10]. These statistics are used to calculate common performance metrics such as accuracy and precision [11]:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
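
As a quick check of (1) and (2), the sketch below computes both metrics from confusion-matrix entries. The function names are illustrative only. The inputs may be raw counts or, as in Table 1, percentages of all samples, because both ratios are unchanged when every entry is scaled by the same factor.

```python
def accuracy(tp: float, tn: float, fp: float, fn: float) -> float:
    """Fraction of all samples that are classified correctly, as in (1)."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp: float, fp: float) -> float:
    """Fraction of positive predictions that are truly positive, as in (2)."""
    return tp / (tp + fp)


if __name__ == "__main__":
    # Entries of Table 1, given as percentages of all samples.
    tp, fn, fp, tn = 65.5, 0.4, 0.0, 34.0
    print(f"accuracy  = {100 * accuracy(tp, tn, fp, fn):.1f}%")  # 99.6%
    print(f"precision = {100 * precision(tp, fp):.1f}%")         # 100.0%
```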


To determine the correctness rate of benign and malignant detection, the receiver operating characteristic (ROC) curve method was used [12], [13]. ROC graphs are useful for organizing and visualizing the performance of classifiers, and they are used in decision-making, machine learning (ML), and data mining (DM) in medical research. The ROC curve shows the tradeoff between the TP rate and the FP rate [11], [14]. A DNN uses multiple layers: the first layer is the input layer, the final layer is the output layer, and all the intermediate layers are hidden layers. Each layer contains nodes, called neurons, that pass information through the network, from the input layer through the hidden-layer nodes to the output-layer nodes [15]-[19]. At each node, the products of the inputs and weights are summed and added to a bias. The result is then passed through the activation function, which works by thresholding: when the value of a node or set of nodes passes the threshold, it is passed to the next layer. The equation that describes the proposed DNN model in this paper is:


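$$y_l = F\left[\sum_{k=1}^{8} w^{(3)}_{kl}\, F\left(\sum_{j=1}^{10} w^{(2)}_{jk}\, F\left(\sum_{i=1}^{9} w^{(1)}_{ij}\, x_i + \text{bias1}_j\right) + \text{bias2}_k\right) + \text{bias3}_l\right], \quad l = 1, 2 \tag{3}$$

where $x_i$ are the inputs, the innermost sum forms hidden layer 1, the middle sum forms hidden layer 2, and the outer sum forms the output layer.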
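
Equation (3) can be read as three nested fully connected layers. The NumPy sketch below evaluates it for the 9-10-8-2 topology described in Step one. The weights are random placeholders (not the trained weights), and a standard logistic sigmoid is assumed as a stand-in for F; the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, used here as a stand-in for the activation F."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, params):
    """Evaluate (3): y = F(W3 F(W2 F(W1 x + b1) + b2) + b3).

    params is a list of (W, b) pairs, one per layer, where W has shape
    (n_out, n_in) and b has shape (n_out,).
    """
    a = np.asarray(x, dtype=float)
    for W, b in params:
        a = sigmoid(W @ a + b)  # weighted sum plus bias, then activation
    return a


def random_params(sizes, seed=0):
    """Placeholder weights for layer sizes such as [9, 10, 8, 2]."""
    rng = np.random.default_rng(seed)
    return [(rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


if __name__ == "__main__":
    params = random_params([9, 10, 8, 2])
    x = np.ones(9)             # one biopsy with nine attribute values
    print(forward(x, params))  # two outputs: benign, malignant
```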


One of the activation functions used with NNs is the sigmoid function, which can be approximated in hardware by a second-order polynomial [4], [20]:

$$f(x) = c + bx + ax^2 \tag{4}$$

where a, b, and c are constant coefficients.
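
The values of a, b, and c in (4) depend on how the approximation is derived. As a hypothetical illustration only, the sketch below fits them by least squares to the logistic sigmoid on an assumed interval [0, 4], uses the symmetry sigmoid(-x) = 1 - sigmoid(x) for negative inputs, and saturates beyond the interval. The interval, the symmetry step, and the function names are assumptions, not the design used in this paper.

```python
import numpy as np


def fit_quadratic(x_max=4.0, n=401):
    """Least-squares fit of c + b*x + a*x**2 to the sigmoid on [0, x_max]."""
    x = np.linspace(0.0, x_max, n)
    y = 1.0 / (1.0 + np.exp(-x))
    a, b, c = np.polyfit(x, y, 2)  # polyfit returns the highest degree first
    return a, b, c


def sigmoid_quadratic(x, a, b, c, x_max=4.0):
    """Evaluate (4) on |x| clamped to x_max, and mirror the result for x < 0."""
    x = np.asarray(x, dtype=float)
    u = np.minimum(np.abs(x), x_max)
    f = c + b * u + a * u**2
    return np.where(x >= 0, f, 1.0 - f)


if __name__ == "__main__":
    a, b, c = fit_quadratic()
    xs = np.linspace(-8.0, 8.0, 1601)
    err = np.abs(sigmoid_quadratic(xs, a, b, c) - 1.0 / (1.0 + np.exp(-xs)))
    print(f"a={a:.4f}, b={b:.4f}, c={c:.4f}, max error={err.max():.4f}")
```

Restricting the fit to non-negative inputs keeps a single quadratic accurate, since the sigmoid is concave there; a fit over a symmetric interval would drive a toward zero and degrade to a straight line.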

For a DNN to make decisions in a way that resembles a human expert's judgment, it should be trained on a large amount of training data [18]. Xilinx provides a Simulink library whose blocks work like other Simulink elements and perform many different functions. VHDL or Verilog code is generated from these blocks by implementing the hardware architecture with resources integrated into Simulink, together with MATLAB .m files. Once the design is complete, XSG automatically generates the VHDL code [15]-[17]. The integrated synthesis environment (ISE) tool is used with XSG to generate the bitstream. Figure 1 represents these steps.

Several researchers have used neural networks (NNs) instead of traditional biological methods to detect breast cancer efficiently and in a timely manner. Aguiar et al. [3] built a single-neuron artificial neural network (ANN) to test a newly designed hyperbolic tangent activation function; their results showed that using 16 bits is more convenient than the other tested solutions. Khodja et al. [4] implemented a single-neuron ANN on an FPGA using XSG, with the sigmoid function designed in polynomial form; the ANN occupied 63% of the Spartan-3 board used. Sepandi et al. [9] created an ANN model to predict breast cancer risk in patients using datasets that include mammographic results and other risk factors. Because the registered data were limited, their ANN can only support decision-making. Ayer et al. [12] evaluated the effect of using large datasets to train an ANN to discriminate between benign and malignant disease. The authors used a three-layer feedforward NN with 1,000 neurons in the hidden layers. The results showed success in accurately detecting benign cases and predicting the risk of breast cancer abnormality. Janghel et al. [21] developed an ANN system to classify breast cancer, implementing four NN models: the back propagation algorithm (BPA), radial basis function networks (RBF), learning vector quantization (LVQ), and a competitive learning network. The back propagation model with a single hidden layer of 25 neurons showed the best results.

Reduced hardware requirements of deep neural network for breast cancer diagnosis (Yasmine M. Tabra)
ISSN: 2252-8938


[Figure: MATLAB algorithm files (.m) and Simulink developer files (.mdl) → automatic generation of VHDL code]

Figure 1. Xilinx system generator design steps


Singhal and Pareek [22] created a backpropagation ANN with 6 neurons in the hidden layer that can predict breast cancer early using the Wisconsin Breast Cancer (Diagnostic) Dataset. This database contains 372 records. The ANN was implemented in the C language. Shukla et al. [23] trained three types of ANN, namely BPA, RBF, and LVQ networks. BPA with 40 hidden neurons achieved the best accuracy among these networks, while the RBF network had the shortest training time. Bharati et al. [24] reviewed the advantages and limitations of ANN models in the literature for breast cancer diagnosis via mammography and then described the preprocessing methods used with ANNs. Desai and Shah [25] reviewed the use of several NN types for breast cancer diagnosis; their comparison of detection accuracy showed that convolutional neural networks (CNNs) provide higher accuracy than ANNs. Tisan et al. [26] designed five different activation functions and implemented a single-neuron ANN using XSG for each of them. They calculated and compared the resource consumption of the implementations, showing that a lookup-table activation requires the fewest gates.

The works mentioned above employed NNs to detect breast cancer, in some cases with reduced FPGA hardware components. In this paper, however, the detection accuracy of the designed DNN is higher, and its hardware resources are lower, than those reported in the literature. This reduction comes from reducing the multiplication and addition operations per neuron from N × N to (N - 1) × (N - 1). Different numbers of bits and binary points were also tested to further reduce the hardware components used.

The dataset is used to design an NN that classifies tumors as either benign or malignant according to the characteristics of sample biopsies. It contains 699 cases, 241 malignant and 458 benign, that is, 65.5% benign and 34.5% malignant. The dataset contains two variables:

i) cancerInputs: a 9x699 matrix defining nine attributes of the 699 biopsies: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses.

ii) cancerTargets: a 2x699 matrix in which each column indicates the correct category with a one in either element 1 or element 2, corresponding to benign and malignant, respectively.

Of the input samples, 70 percent are used for training, 15 percent for testing, and 15 percent for validation.
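
The class shares and the 70/15/15 division can be reproduced with the sketch below. It assumes a simple random permutation with a fixed seed; the toolbox's own data-division function may assign samples differently, so the index sets are illustrative only and the function name is hypothetical.

```python
import numpy as np

N_BENIGN, N_MALIGNANT = 458, 241  # class counts stated in the text


def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Randomly partition sample indices into training/validation/test sets.

    Whatever remains after the training and validation shares goes to the
    test set, so the three index arrays cover every sample exactly once.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


if __name__ == "__main__":
    n = N_BENIGN + N_MALIGNANT  # 699 biopsies
    print(f"benign {100 * N_BENIGN / n:.1f}%, malignant {100 * N_MALIGNANT / n:.1f}%")
    tr, va, te = split_indices(n)
    print(len(tr), len(va), len(te))  # 489 105 105
```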


2. RESEARCH METHOD 

Designing a DNN with high detection accuracy and reduced hardware requirements involves the steps shown in Figure 2. Each block in the flowchart represents a single step in the design and implementation of the proposed DNN; these steps are described in detail below.


[Figure: detect the most appropriate number of hidden layers and of input, hidden, and output nodes using NN toolbox → import training and testing data to MATLAB → build the DNN in system generator → implement the system on the FPGA board]

Figure 2. Flowchart of proposed work


i) Step one 

- Use the MATLAB NN toolbox to determine the most suitable number of hidden layers, the number of nodes in each layer, and the optimum weights used in each neuron. The NN simulator gave the best results for a DNN with two fully connected hidden layers. The DNN input layer contains nine nodes, matching the number of features in the training dataset. The first hidden layer contains ten neurons, and the second hidden layer contains eight neurons. Finally, the output layer has two neurons that produce two outputs: one for benign and the other for malignant detection.

- Import the network weights into the MATLAB workspace to be used later in system generator.

ii) Step two 

- Import the dataset variables ‘cancerInputs’ and ‘cancerTargets’ into the MATLAB workspace to test the accuracy of the DNN design in system generator.

iii) Step three

- In the MATLAB Simulink window, add the system generator block and fill in the required information, such as the FPGA board name and model and the target working directory, as in Figure 3. This block also provides the generate button used to convert the model to VHDL code.

- Use the ‘from workspace’ block to read data from the workspace. All inputs are grouped in the ‘Inputs’ subsystem, as shown in Figure 4. All inputs and weights are defined as ‘signed’ numbers with fixed-point precision. Reducing the number of bits and binary points reduces the resources required to implement the DNN design, and using fixed-point precision avoids the cost of implementing floating-point arithmetic in hardware. All the subsystems use a mask configuration whose parameters are the number of bits and the binary points. Different values of these parameters were tested, as shown in Figure 5.

Figure 5. Mask for subsystem parameters selection
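
To see how the (number of bits, binary points) mask parameters trade precision for hardware, the sketch below quantizes values to a signed two's-complement fixed-point format with `bits` total bits, `frac` of which lie after the binary point. Rounding to nearest and saturating on overflow are assumptions made for this illustration; the actual blocks may be configured to truncate or wrap instead, and `to_fixed` is a hypothetical name.

```python
import numpy as np


def to_fixed(x, bits, frac):
    """Quantize x to a signed fixed-point grid with `frac` fractional bits.

    A `bits`-bit two's-complement word scaled by 2**-frac represents the
    range [-2**(bits-1), 2**(bits-1) - 1] * 2**-frac. Values are rounded
    to the nearest step (ties to even, as np.round does) and saturated.
    """
    scale = 2.0 ** frac
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = np.clip(np.round(np.asarray(x, dtype=float) * scale), lo, hi)
    return q / scale


if __name__ == "__main__":
    w = np.array([0.7, -1.3, 3.0, -3.0])
    for bits, frac in [(32, 16), (16, 12), (8, 6)]:
        print((bits, frac), to_fixed(w, bits, frac))
```

With (8,6), for example, the step is 1/64 and the representable range is [-2, 1.984375], so a weight of 3.0 saturates while 0.7 becomes 0.703125.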


- Create a single neuron using ‘Multi’ and ‘addsub’ blocks. Each input is multiplied by its weight using a ‘Multi’ block, and the products are summed using ‘addsub’ blocks. The output of the final ‘addsub’ block is passed to the ‘FuncActiv’ subsystem, which represents the loglog activation function described in (4). The final output of the ‘FunActiv’ subsystem is either 0 or 1. Figure 6 shows the model of the loglog activation function.

- Finally, join all these blocks together in a subsystem named ‘Nero’, as shown in Figure 7. In our design, the number of ‘Multi’ blocks is one less than the number of inputs, and the same is true for the number of ‘addsub’ blocks. Use a ‘gatewayIn’ block for each input to the ‘Nero’ block.

Figure 7. Neuron subsystem


To reduce the overall hardware components used, avoid placing ‘gatewayOut’ blocks between ‘Nero’ blocks or between subsystems.
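
The saving from building each neuron with one ‘Multi’ block and one ‘addsub’ block fewer than its number of inputs can be tallied per layer. The sketch below assumes, as the baseline, one block of each kind per input of every neuron; with the 9-10-8-2 topology, the difference comes to 20 of each, consistent with the reduction reported in the conclusion. The function and constant names are hypothetical.

```python
# Layer sizes of the proposed DNN: 9 inputs, 10 and 8 hidden neurons, 2 outputs.
SIZES = [9, 10, 8, 2]


def block_counts(sizes, saved_per_neuron=0):
    """Count 'Multi' blocks per layer when each neuron with n inputs uses
    n - saved_per_neuron of them ('addsub' blocks are tallied the same way)."""
    return [n_out * (n_in - saved_per_neuron)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


if __name__ == "__main__":
    baseline = block_counts(SIZES)                      # one block per input
    proposed = block_counts(SIZES, saved_per_neuron=1)  # inputs - 1 per neuron
    print(baseline, sum(baseline))  # [90, 80, 16] 186
    print(proposed, sum(proposed))  # [80, 72, 14] 166
    print("saved:", sum(baseline) - sum(proposed))  # 20
```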

- Duplicate the ‘Nero’ subsystem to obtain the number of neurons required in each layer, complete the connections, and group them into subsystems named ‘hidden layer 1’, ‘hidden layer 2’, and ‘output layer 1’, as in Figure 4. The first hidden layer includes ten fully connected neurons (Nero1, Nero2, ..., Nero10). Each ‘Nero’ subsystem produces one output, so the ‘hidden layer 1’ subsystem produces ten outputs, as in Figure 8. These ten outputs are the inputs to the ‘hidden layer 2’ subsystem, which includes eight ‘Nero’ subsystems (Nero1, Nero2, ..., Nero8) representing the eight neurons, as in Figure 9. The ‘hidden layer 2’ subsystem produces eight outputs, which are the inputs to the output layer subsystem. The output layer has two neurons (Nero1, Nero2); it receives the eight outputs of the previous hidden layer and produces two outputs, as in Figure 10.

- Use the testing data to check the accuracy of the implementation. Two ‘display’ blocks show the classification results, the first for ‘benign’ and the second for ‘malignant’. If display1 shows ‘1’ and display2 shows ‘0’, the case is classified as ‘benign’; if display1 shows ‘0’ and display2 shows ‘1’, the case is classified as ‘malignant’.
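
The display rule above can be written as a small decoding function. The ‘undetermined’ label for the remaining combinations and the 0.5 threshold used to turn raw outputs into display bits are assumptions added for this sketch; both function names are hypothetical.

```python
def decode(display1: int, display2: int) -> str:
    """Map the two display values to a class label, following Step three.

    Any other combination (both 0 or both 1) is reported as 'undetermined';
    that label is an assumption of this sketch, not part of the design.
    """
    if (display1, display2) == (1, 0):
        return "benign"
    if (display1, display2) == (0, 1):
        return "malignant"
    return "undetermined"


def from_outputs(y1: float, y2: float, threshold: float = 0.5) -> str:
    """Threshold two real-valued outputs (assumed cut-off 0.5), then decode."""
    return decode(int(y1 >= threshold), int(y2 >= threshold))


if __name__ == "__main__":
    print(decode(1, 0), decode(0, 1), from_outputs(0.2, 0.9))
```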

iv) Step four 

- Press the generate button (Figure 3) to create the VHDL code that implements the model. Open the saved VHDL code in ISE Design Suite 14.7, synthesize it, and convert it to a .bit file to be uploaded to the Kintex-7 xc7k325t-3fbg676 board.


3. RESULTS AND DISCUSSION 

The DNN was trained using the MATLAB NN toolbox for 72 epochs, where a single epoch is one full pass of the training algorithm over the entire training set. Each epoch consists of 10.9219 iterations, where a single iteration is one step of the gradient descent algorithm toward minimizing the loss function using a mini-batch. The best training performance was an MSE of 7e-4, with a total training time of 2.262 s. Training completed after a total of 786.375 iterations. Figure 11 shows the training performance of the DNN. The training, testing, and validation accuracies were derived from the confusion matrix (TP, TN, FP, and FN rates) shown in Table 1.

Table 1. Confusion matrix

Class | Target positive | Target negative
Positive | 65.5% (TP) | 0.4% (FN)
Negative | 0% (FP) | 34.0% (TN)


Applying (1) and (2) to the values in Table 1 gives:

$$\text{Accuracy} = \frac{65.5 + 34.0}{65.5 + 34.0 + 0 + 0.4} = 99.6\% \tag{5}$$

$$\text{Precision} = \frac{65.5}{65.5 + 0} = 100\% \tag{6}$$

Another measurement used to test the performance of the DNN is the ROC curve. The closer the ROC curve is to the upper-left (northwest) corner of the plot, the higher the accuracy of the DNN, since the TP rate is then high relative to the FP rate. Figure 12 shows the NN ROC curves for training, testing, and validation. After the VHDL code was generated from system generator, the Xilinx ISE 14.7 synthesizer was run to determine the FPGA resources occupied by the design. The result is displayed in the synthesis report given in Figure 13. The system was then uploaded to the Kintex-7 xc7k325t-3fbg676 board.
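
For readers reproducing an ROC curve such as Figure 12 outside the toolbox, the sketch below computes ROC points and the area under the curve from classifier scores with NumPy. The function names are illustrative; tied scores are merged into a single threshold, the area is obtained with the trapezoidal rule, and at least one positive and one negative sample are assumed.

```python
import numpy as np


def roc_points(scores, labels):
    """Return (fpr, tpr) arrays for thresholds swept from high to low score.

    labels use 1 for positive and 0 for negative; both classes must occur.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="mergesort")
    s, y = scores[order], labels[order]
    tps = np.cumsum(y)      # true positives at or above each cut
    fps = np.cumsum(1 - y)  # false positives at or above each cut
    # Keep only the last index of each run of equal scores (one cut per score).
    last = np.r_[np.nonzero(np.diff(s))[0], len(s) - 1]
    tpr = np.r_[0.0, tps[last] / tps[-1]]
    fpr = np.r_[0.0, fps[last] / fps[-1]]
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


if __name__ == "__main__":
    fpr, tpr = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    print(auc(fpr, tpr))  # 0.75
```

An area of 1.0 corresponds to a curve that passes through the upper-left corner, which is the ideal case described above.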


[Figure: "All ROC" panel; x-axis: False Positive Rate]

Figure 12. ROC

Logic utilization | Used | Available | Utilization
Number of slice registers | 17 | 407600 | 0%
Number of slice LUTs | 48 | 203800 | 0%
Number of fully used LUT-FF pairs | 16 | 49 | 32%
Number of bonded IOBs | 117 | 400 | 29%
Number of BUFG/BUFGCTRLs | 1 | 32 | 3%

Figure 13. Summary of used space on the FPGA


To show how the total number of equivalent gates used depends on the number of bits allocated for the hardware implementation of the DNN, ISE resource utilization reports were generated for different numbers of bits and binary points of the inputs and weights. Table 2 compares them and shows that 8 bits with 6 binary points uses the least space.


Table 2. Device utilization of FPGA logic circuits for hardware implementation

Precision used (number of bits, binary points)
Logic utilization | (32,16) | (16,12) | (8,6)
Number of slice registers | 65 | 33 | 17
Number of slice LUTs | 96 | 64 | 48
Number of fully used LUT-FF pairs | 64 | 32 | 16
Number of bonded IOBs | 385 | 206 | 112
Number of BUFG/BUFGCTRLs | 1 | 1 | 1
Sum of required resources | 611 | 336 | 194
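
The "Sum of required resources" row is the plain sum of the five utilization rows above it. The short check below recomputes it from the table entries; the dictionary layout is only a convenient encoding of Table 2.

```python
# Rows of Table 2; each tuple holds the (32,16), (16,12), and (8,6) columns.
TABLE2 = {
    "Slice registers": (65, 33, 17),
    "Slice LUTs": (96, 64, 48),
    "Fully used LUT-FF pairs": (64, 32, 16),
    "Bonded IOBs": (385, 206, 112),
    "BUFG/BUFGCTRLs": (1, 1, 1),
}
FORMATS = [(32, 16), (16, 12), (8, 6)]

# Sum each column across the five resource rows.
totals = [sum(column) for column in zip(*TABLE2.values())]
for fmt, total in zip(FORMATS, totals):
    print(fmt, total)  # (32, 16) 611, (16, 12) 336, (8, 6) 194
```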


The authors concentrated on the DNN design that provides the highest accuracy with reduced hardware components. First, the NN toolbox was used to build different designs and test the performance of each. The design with the best accuracy was chosen, implemented in system generator, converted to VHDL code, and tested on an FPGA device. The DNN weights were imported from the NN toolbox for use in XSG. The results show that the design can accurately classify benign and malignant cases, with a mean square error (MSE) of 7e-4 after 72 training epochs and 786.375 iterations. Table 3 compares these results with related works in the same field.


Table 3. Comparison with ANN for breast cancer detection

Ref. | Size of network (I-H-O) | Accuracy (%)
Singhal and Pareek [22] | 9-6-2 | 97.8495
Janghel et al. [21] | 9-15-1 | 95.82
Shukla et al. [23] | 40 (35-5) | 92
Dabeer et al. [27] | CNN | 93.45
Our work | 9-10-8-2 | 99.6


The FPGA implementation has the lowest resource requirement, 194 logic resources, in the case of 8 bits and 6 binary points. Table 4 gives another comparison with researchers who implemented NNs with 8 bits and 6 binary points. The resource usage and MSE of the implementation are compared with the literature to show that this work uses fewer resources than the literature implementations, with lower or comparable MSE.


Table 4. Comparison with ANN resource usage and MSE

Ref. | Resource usage (8,6) | MSE
Aguiar et al. [3] | 500 | 0.2857
Khodja et al. [4] | 8820 | 1.666e-7
Tisan et al. [26] | 877 | 111
Our work | 194 | 7e-4

4. CONCLUSION

A DNN with two hidden layers was designed to distinguish between benign and malignant tumors and successfully diagnose breast cancer. The DNN design required fewer adders and multipliers than those used in previous papers, a total reduction of 20 adders and 20 multipliers. The proposed DNN achieved an accuracy rate of 99.6 percent, 1.75 percentage points higher than the most accurate similar work. With the (8,6) scenario, the implementation error rate was 7e-4, 28 percent lower than similar projects, and the design used 306 fewer hardware components, which represents 30 percent of the FPGA device space. In conclusion, the proposed DNN design successfully reduced hardware space utilization on FPGA devices while achieving a higher accuracy rate.